Text Classification Method for Data Cleaning
Similar resources
Training Data Cleaning for Text Classification
In text classification (TC) and other tasks involving supervised learning, labelled data may be scarce or expensive to obtain; strategies are thus needed for maximizing the effectiveness of the resulting classifiers while minimizing the required amount of training effort. Training data cleaning (TDC) consists in devising ranking functions that sort the original training examples in terms of how...
Data Cleaning for Classification Using Misclassification Analysis
In many classification problems, data cleaning is used as a preprocessing technique to achieve better results. The purpose of data cleaning is to remove noise, inconsistent data and errors from the training data. This should yield a better, more representative data set from which to develop a reliable classification model. In most classification models, unclean data could somet...
Classification Method for Shared Information on Twitter Without Text Data
During a disaster, appropriate information must be collected. For example, victims and survivors require information about shelter locations and dangerous points or advice about protecting themselves. Rescuers need information about the details of volunteer activities and supplies, especially potential shortages. However, collecting such localized information is difficult from such mass media as ...
A Novel One Sided Feature Selection Method for Imbalanced Text Classification
Imbalanced data appear in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the majority class and may even treat minority-class data as outliers. The text data is one of t...
A Random Walks Method for Text Classification
A practical text classification system should be able to utilize information from both expensive labelled documents and large volumes of cheap unlabelled documents. It should also deal easily with newly input samples. In this paper, we propose a random walks method for text classification, in which the classification problem is formulated as solving the absorption probabilities of Markov random w...
Journal
Journal title: IOSR Journal of Computer Engineering
Year: 2012
ISSN: 2278-8727,2278-0661
DOI: 10.9790/0661-0754554